(54) Aligning two audio signals

(57) In a method for aligning two audio signals A and B, e.g. for
automatic editing between recordings, repeated measurements are made
of the similarity between the two signals and an optimum time offset
for aligning the signals is determined. Sample sections of the two
signals around the "out" and "in" points chosen by the user are
outputted by facility 11 and sub-sections are analysed by Fast
Fourier Transform circuits 12, 13 to derive a corresponding series of
frequency spectra. Peaks in the correlation function performed at 14
between the two spectra are detected at 15 and, from the position of
the peaks, the best shift to apply to one of signals A, B to bring
them into time alignment is deduced. The hardware 12-16 may be
replaced by a computer or microprocessor.

At least one drawing originally filed was informal and the print
reproduced here is taken from a later filed formal copy.






Figure 1 (sheet 1/9): first take, "the quick brown fox jumped
errm.."; second take, "fox jumped over the lazy dog"; the out point
in the first take and the in point in the second take are marked.

Figure 8: editing facility 11; signal processor.



Figure 2 (sheet 2/9): disc store 10; editing facility 11; signal A
and signal B; FFT circuits 12 and 13; correlator 14; peak detector
15; peak position analyser 16.



Figure 6 (sheet 6/9): horizontal axis, time offset W.



Figure 7 (sheet 7/9): histogram; horizontal axis, position of peak.



Figure 9 (flowchart), boxes in order:

load 2 audio segments
split audio segments into blocks of 256 samples (5.33 ms)
compute frequency spectrum of each block
set frequency band, i = 0
calculate power, P(i), in frequency band i, in all blocks of both
audio segments
correlate temporal variation of power in band i between both audio
segments
save position of correlation peak, Wmax(i)
increment i (loop test; only the "yes" branch label is legible)
analyse peak positions, Wmax(i), with optional weighting according
to P(i)
calculate optimum offset W between audio segments
output W to editing facility



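Figure 10 (sheet 9/9): real (Re) and imaginary (Im) parts of time
series and of their Fourier transforms, including i.g(t) with
transform i.G(w), and f(t) + i.g(t) with transform F(w) + i.G(w).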






ALIGNING TWO AUDIO SIGNALS IN TIME, FOR EDITING



This invention relates to the field of audio recording and
specifically to a method for aligning two audio signals in time, for
instance for automating the adjustment of edits between recordings
to obtain a high quality edit without manual intervention.

BACKGROUND OF THE INVENTION
In recording work it is frequently necessary to edit recordings
together to remove mistakes or intrusive noises, for example. This
is traditionally carried out by locating a suitable point in a first
audio recording prior to the error and then finding the matching
point in a second recording of the same material. The edit is then
carried out between these two points, joining the former to the
latter to remove the flawed section of the material. The perceived
quality of the resulting edited audio material is critically
dependent on the accuracy with which these two edit points are
located.

The timing of the second recording relative to the first at the
instant of the edit will determine whether audio material is
repeated or lost as the edit is replayed. This will affect the
extent to which the edit is imperceptible when replayed.
Traditionally the location of these edit points is carried out by
listening to the audio recordings at low speed and identifying the
appropriate instants so as to align the two recordings in time.
This is an operation for which considerable experience is required.

The object of the present invention is to provide a method of
aligning two audio recordings in time such as for the purpose of
performing an edit between them, thereby eliminating the need for
the manual adjustment of the timing of one recording relative to the
other.

The invention is defined in the appended claims to which reference
should now be made.

Briefly described in its preferred embodiment, the invention uses a
method of comparing the similarity of two audio signals in a
multiplicity of frequency bands with varying time offsets between
the two signals. The similarity measurements are then used to
derive a measurement of the relative timing of the two audio signals
and hence the time offset which must be applied to one of the audio
signals to bring it into time alignment with the other.

BRIEF DESCRIPTION OF THE DRAWINGS
In order that the manner in which the foregoing can be understood in
detail, a particularly advantageous embodiment thereof will be
described with reference to the accompanying drawings, in which:-

Figure 1 is a representation of the editing process indicating the
necessary time alignment of the two audio recordings and the
position of the edit between them;

Figure 2 is a block circuit diagram of apparatus for aligning two
audio recordings in time embodying the invention;

Figure 3 is a representation of two audio signals varying with time;

Figure 4 is a representation of the frequency spectra of the two
signals varying in time;

Figure 5 is a representation of the power contained in a frequency
band of each of the two signals varying with time;

Figure 6 is a representation of the correlation function of the two
functions represented in Figure 5;

Figure 7 is a histogram of the position of the peaks in the
correlation functions (one of which is shown in Figure 6) of all the
frequency bands of the signals;

Figure 8 is a block hardware diagram of a computer-based embodiment
of the invention;

Figure 9 is a flowchart illustrating the processor operations in the
system of Figure 8; and

Figure 10 illustrates the fast Fourier transform operation employed.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Overview. 

Consider an edit to join two overlapping audio recordings together.
The first recording contains audio up to a point where, for example,
a mistake was made; see Figure 1a. The second contains audio
starting at a point before the mistake, continuing on to the end;
see Figure 1b. To make the edit, the user marks where he wants to
go out of the first recording (the "out point") and where he wants
to go into the second (the "in point"); see Figure 1c. The edit is
performed by playing material from the first take up to the out
point and then material from the second take starting at the in
point.

In practice automated edit adjustment may be carried out in
accordance with this invention as follows.

The user chooses, say, the out point that he wants. He then roughly
positions the in point. The audio samples around both the in point
and the out point are then analysed by calculating a correlation
function between the two signals. This should indicate where the
best match between the two audio signals occurs and hence the
optimum position for the in point. The required adjustment is then
either made automatically or indicated to the user.

The automated adjustment can be carried out as follows. A section
from each signal (see Figure 3) is divided into blocks of samples.
The power spectrum of each of the blocks of samples is then
calculated. This produces a series of spectra of the signals at
regular time intervals (see Figure 4).

By selecting the same frequency band from each of the spectra, the
variation in the power in that frequency band as a function of time
is determined (see Figure 5).

The correlation function of the temporal variation of the power in a
frequency band from one signal with that from the other has a peak.
The position of the peak is related to the temporal shift which,
when applied to one signal, brings it into time alignment with the
other (for the frequency band in question); see Figure 6.



The positions of the peaks of correlation functions from all the
frequency bands are collected together (see Figure 7). The best
shift to apply to the audio signals to bring them into time
alignment is deduced from this assortment of peak positions.
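
The overview above can be sketched end to end in a few lines. The
following Python (NumPy) sketch is illustrative only and is not taken
from the specification: the function name estimate_offset_blocks, the
removal of the mean from each band, the use of a circular correlation
and the reporting of the result in whole blocks are all assumptions
made for the example. Both inputs are assumed to have the same length.

```python
import numpy as np

def estimate_offset_blocks(a, b, block=256):
    """Estimate, in whole blocks, how far signal a lags signal b (hypothetical helper)."""
    def band_power(x):
        x = np.asarray(x, dtype=float)
        n = len(x) // block
        spec = np.fft.rfft(x[: n * block].reshape(n, block), axis=1)
        return np.abs(spec[:, : block // 2]) ** 2      # shape (blocks, bands)

    pa, pb = band_power(a), band_power(b)
    pa = pa - pa.mean(axis=0)                          # assumption: remove mean per band
    pb = pb - pb.mean(axis=0)
    # Circular cross-correlation along the time axis, for every band at once.
    spec_a = np.fft.fft(pa, axis=0)
    spec_b = np.fft.fft(pb, axis=0)
    corr = np.fft.ifft(spec_a * np.conj(spec_b), axis=0).real
    n = corr.shape[0]
    peaks = np.argmax(corr, axis=0)                    # peak position per band
    lags = np.where(peaks > n // 2, peaks - n, peaks)  # map to signed lags
    sizes = corr[peaks, np.arange(corr.shape[1])]      # peak size per band
    # Histogram of peak positions, each band voting with its peak size.
    hist = np.bincount(lags - lags.min(), weights=sizes)
    return int(np.argmax(hist)) + int(lags.min())
```

A positive result means that a is delayed relative to b by that many
blocks; multiplying by the block length gives the shift in samples.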

First Embodiment.

Figure 2 shows a suitable implementation of the invention including
a disc store 10 holding the two audio recordings to be aligned and
an editing facility 11 of known type connected to write to and read
from the store. The editing facility 11 makes available the two
signals A and B to be compared and supplies them to two fast Fourier
transform (FFT) circuits 12 and 13 respectively. Such circuits are
commercially available and execute a Fourier transform (or frequency
analysis) on the input signal applied thereto. A correlator 14 then
compares the outputs of the two FFT circuits, in a manner described
below. The output of the correlator is applied to a peak detector
15 which in conjunction with a peak position analyser 16 determines
where the peak lies and hence the amount of temporal adjustment
required to align the two recordings.

The system of Figure 2 operates as follows. The editing facility
outputs two sections of audio data, one from each of the two signals
A and B. Typical signal sections are shown in Figure 3. Typically
the sections may be 32k (32768) samples long, sampled at a sampling
rate of 48kHz, corresponding to two-thirds of a second in duration.
The sampling will typically be to 16-bit accuracy. Each 32k-sample
section is then divided in time into 128 blocks each of 256 samples.

The FFT circuits 12, 13 then perform fast Fourier transforms on each
of the 128 blocks of each of the two signals to provide for each
block a frequency spectrum. Each frequency spectrum will be defined
by a block of 128 samples. Figure 4 is a three-dimensional diagram
illustrating the two frequency spectra for two typical signals. For
each time period corresponding to the duration of one block of the
signal the diagram provides a plot of power against frequency. Most
of the power is at relatively low frequencies though for signal 1
(as it is here labelled) there is a notable power component at a
relatively high frequency. Figure 4 thus represents the inputs to
the correlator 14.
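
As a minimal sketch of the block transform stage, assuming NumPy and
a hypothetical helper name block_power_spectra, a 32768-sample
section can be reshaped into 128 blocks of 256 samples and
transformed block by block. Keeping 128 of the 129 bins returned by
a real-input FFT is an assumption made to match the 128-sample
spectra mentioned above. At 48 kHz one block lasts 256/48000 s,
about 5.33 ms, and the whole section 32768/48000 s, about 0.68 s.

```python
import numpy as np

BLOCK = 256  # samples per block, as stated in the description

def block_power_spectra(section, block=BLOCK):
    """Return power per (block, frequency band); hypothetical helper."""
    x = np.asarray(section, dtype=float)
    n_blocks = len(x) // block
    # One row per block of consecutive samples.
    blocks = x[: n_blocks * block].reshape(n_blocks, block)
    # Real-input FFT of every block; keep block // 2 bins (assumption).
    spectra = np.fft.rfft(blocks, axis=1)[:, : block // 2]
    return np.abs(spectra) ** 2
```

Column i of the returned array is then the temporal variation of the
power in frequency band i, one value per block.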

The correlator 14 calculates the correlation function of the
temporal variation in each frequency band of one signal with the
corresponding variation derived from the other signal. Each
spectrum is stored as a 128 word block, and is of the form shown in
Figure 5. The power in the first spectral component of each block
thus provides a measure of the temporal variation of that spectral
component, and similarly for the other subsequent spectral frequency
components. The correlation function of these two temporal
variations in power content of the first spectral band is calculated
to find out where variations in power are most alike in the two
signals. The correlation function produced is of the type shown in
Figure 6. This correlation is carried out by further FFT circuits
within the correlator 14. Such correlation is carried out in time
for all the spectral frequency components.

The correlation function is:

F.T. {[F(w)][G*(w)]}

where F.T. denotes the Fourier transform, the asterisk * denotes the
complex conjugate, and F(w) and G(w) are the Fourier transforms of
the two time series, i.e. the outputs of the circuits 12 and 13.
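
A hedged sketch of this correlation step for one frequency band,
assuming NumPy: the product of one spectrum with the conjugate of the
other is transformed back with an inverse FFT, which yields a
circular cross-correlation. The function names, the removal of the
mean before transforming and the mapping of the peak index to a
signed lag are assumptions of the example rather than details given
in the description.

```python
import numpy as np

def correlate_via_fft(p, q):
    """Circular cross-correlation of two equal-length real series."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    P = np.fft.fft(p - p.mean())   # assumption: remove the mean first
    Q = np.fft.fft(q - q.mean())
    # corr[k] = sum over n of p[n + k] * q[n], indices taken modulo N.
    return np.real(np.fft.ifft(P * np.conj(Q)))

def peak_lag(corr):
    """Position of the correlation peak as a signed lag."""
    n = len(corr)
    k = int(np.argmax(corr))
    return k - n if k > n // 2 else k
```

With this convention a lag of d means that p is a copy of q delayed
by d positions, so shifting p earlier by d brings the two into line.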

During the correlation process it can be beneficial to apply
weighting to the functions being correlated. This may be done while
the data is in the frequency domain, i.e. after the Fourier
transforms of the two functions have been calculated and one has
been multiplied by the conjugate of the other, but before the
inverse Fourier transform is performed.

An example of such a weighting function is the magnitude squared
coherence spectrum which can be considered to be a measure of how
much the spectral components of one function are consistent with
those of the other function. The spectra of the functions would be
divided into segment pairs and the magnitude squared coherence
spectrum calculated as follows:

where spectra F(w) and G(w) have been divided into n segment pairs:-

F0, F1, F2 ... Fn,

G0, G1, G2 ... Gn.

Less relevant components of the functions being correlated can be
subdued by multiplying the frequency domain data by the magnitude
squared coherence spectrum.

The correlation function would be modified:-

F.T. {[F(w)][G*(w)][mscs(w)]}
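
The expression for the magnitude squared coherence spectrum does not
survive in this text. The sketch below therefore assumes the
standard textbook definition, evaluated per frequency over the
segment pairs: the squared magnitude of the summed cross-spectrum
divided by the product of the two summed power spectra. The function
name and the small constant eps, which guards against division by
zero, are hypothetical.

```python
import numpy as np

def magnitude_squared_coherence(F_segs, G_segs, eps=1e-12):
    """Assumed standard MSC; rows are segments, columns are frequencies."""
    F = np.asarray(F_segs, dtype=complex)
    G = np.asarray(G_segs, dtype=complex)
    cross = np.sum(F * np.conj(G), axis=0)      # summed cross-spectrum
    power_f = np.sum(np.abs(F) ** 2, axis=0)    # summed power of F
    power_g = np.sum(np.abs(G) ** 2, axis=0)    # summed power of G
    return np.abs(cross) ** 2 / (power_f * power_g + eps)
```

The result lies between 0 and 1 at each frequency, so multiplying the
product of the two spectra by it before the inverse transform leaves
consistent components largely untouched and subdues the others.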

The position of the peak in the correlation function of the two
arrays shows by how much one array should be temporally offset
relative to the other so that they fit best.

In this way 128 plots of correlation against displacement are
obtained, one for each frequency band. The peak detector 15 detects
the peak of each of these plots. Thus 128 such peak values are
obtained, and these are "plotted" as a histogram in the peak
position analyser 16 showing how many times a peak occurs at each
displacement, as shown in Figure 7. This analyser thus determines
the most "popular" displacement of the 128 values obtained for the
different frequency bands; this value being used as the required
displacement value.

Preferably, rather than just increasing the value in the histogram
by one if a peak is found, the size of the peak is added. The size
of the peak depends on the original signal amplitude.



Additionally, weighting the different spectral bands may be
available as an option to the user, and the range of frequency bands
may also be definable by the user. In any event the peak in the
resultant histogram is used as the shift required to bring the
signals into time alignment for all the frequency bands considered.

The standard deviation of the peak positions plotted on the
histogram may be calculated to act as a confidence indicator.
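
The peak position analysis can be sketched as follows, assuming
NumPy. The function name best_offset and its return convention are
hypothetical; the optional peak_sizes argument stands for the
preferred variant in which each band adds the size of its peak to
the histogram rather than one, and the unweighted standard deviation
of the peak positions is returned as a rough confidence indicator.

```python
import numpy as np

def best_offset(peak_lags, peak_sizes=None):
    """Return (most popular lag, standard deviation of the lags)."""
    lags = np.asarray(peak_lags, dtype=int)
    if peak_sizes is None:
        weights = np.ones(len(lags))            # one vote per band
    else:
        weights = np.asarray(peak_sizes, dtype=float)
    lowest = int(lags.min())
    # Histogram over lags, shifted so that the lowest lag maps to bin 0.
    hist = np.bincount(lags - lowest, weights=weights)
    offset = int(np.argmax(hist)) + lowest
    spread = float(np.std(lags))                # small spread = more confidence
    return offset, spread
```

A user-defined range of frequency bands would correspond to passing
only the lags and sizes of the bands inside that range.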

Second Embodiment.

In practice it is convenient to implement the method in a computer
or microprocessor as shown in Figure 8 where the special purpose
hardware of Figure 2 is replaced by a signal processor 20. The
processor 20, which may be a Motorola DSP 56000, operates in
accordance with a program which is summarised in the flow chart of
Figure 9 which will essentially be self explanatory in view of the
description of the first embodiment.

It is particularly convenient to undertake the fast Fourier
transforms, required first to produce the frequency spectrum and
then in the correlation operation, in the following way. In this
method two FFTs can be calculated at the same time when the two
signals requiring transforming are both entirely real. One signal
is put into the real part of the elements of an array of complex
numbers, the other into the imaginary part. When the FFT is
performed in place on this array it produces an array containing the
complex spectra of both of the signals. The two separate, complex,
spectra can be extracted from the array, since the two original
signals were entirely real.

Figures 10(a) and (b) show examples of two "real" time series and
their Fourier transform (real and imaginary parts).

Figure 10(c) shows the effect of interchanging the real and
imaginary parts of a "real" signal.

Figure 10(d) shows the effect of adding together one "real" signal
as it is to the other "real" signal with its real and imaginary
components interchanged.

A priori knowledge of the even-ness of the real part of the Fourier
transform of a purely real signal and the odd-ness of that of a
purely imaginary signal makes it possible to extract the real and
imaginary parts of the Fourier transforms of the two original "real"
signals as follows:

Re[F(u)] = Re[F.T.{f(t) + i.g(t)}] + Re[F.T.{f(-t) + i.g(-t)}]
Im[F(u)] = Im[F.T.{f(t) + i.g(t)}] - Im[F.T.{f(-t) + i.g(-t)}]
Re[G(u)] = Im[F.T.{f(t) + i.g(t)}] + Im[F.T.{f(-t) + i.g(-t)}]
Im[G(u)] = -(Re[F.T.{f(t) + i.g(t)}] - Re[F.T.{f(-t) + i.g(-t)}])

where F(u) and G(u) denote the Fourier transform of f(t) and g(t)
respectively, and Re[x] and Im[x] denote the real and imaginary part
of a complex number x respectively.
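
The extraction can be checked numerically. The sketch below, which
assumes NumPy and uses the hypothetical name two_real_ffts, packs
two real series into one complex array, performs a single FFT, and
recovers both spectra from the relations above. Two points are
assumptions of the sketch rather than statements in the text: the
transform of the time-reversed series is obtained by reversing the
frequency index instead of by a second transform, and an overall
factor of one half, which the relations as printed do not show, is
included so that the results equal the individual transforms exactly.

```python
import numpy as np

def two_real_ffts(f, g):
    """Transform two equal-length real series with one complex FFT."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    Z = np.fft.fft(f + 1j * g)          # F.T.{f(t) + i.g(t)}
    # F.T.{f(-t) + i.g(-t)} is Z at the negated frequency index.
    Zr = np.roll(Z[::-1], 1)            # Zr[k] = Z[(-k) mod N]
    F = 0.5 * (Z.real + Zr.real) + 0.5j * (Z.imag - Zr.imag)
    G = 0.5 * (Z.imag + Zr.imag) - 0.5j * (Z.real - Zr.real)
    return F, G
```

If only relative magnitudes matter, as when locating a correlation
peak, the factor of one half can be dropped without changing the
result.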

The methods described, which involve splitting the signals into
frequency bands and determining their similarities in the separate
frequency bands, have been found to be particularly successful in
reliably identifying the amount of displacement required to bring
the two signals into alignment.

1. A method for aligning two audio signals in time, comprising the
steps of:

determining the similarity of the two signals for varying time
offsets between them, and

deriving from the similarity measurements an optimum time
offset to bring the signals into time alignment.

2. A method according to claim 1, in which the similarity of the
audio signals is measured in a multiplicity of frequency bands, and
the multiplicity of similarity measurements is processed to provide
a preferred time offset.

3. A method according to claim 1 or 2, in which, for similarity
measurements between any two signals, the coherence between those
two signals is used to weight the similarity measurements of the
signals.

4. A method according to any preceding claim, in which the power
of the harmonics in the various frequency bands is used to weight
the similarity measurements.

5. A method according to any of claims 2 to 4, in which the
similarity measurements are weighted according to frequency band.

6. A method according to claim ..., in which the signals are divided
into blocks, the frequency spectrum of each block is determined, the
power variation with time is determined for each of a plurality of
frequency bands, the power variation of the two signals is
correlated for each frequency band, the peak of each correlation
function is determined, and the peak value of the peaks thus
obtained determined to provide a desired offset.

7. A method of aligning two audio signals in time, substantially
as herein described with reference to the drawings.



8. Apparatus for aligning two audio signals in time, comprising:

means for determining the similarity of the two signals for
varying time offsets between them, and

means for deriving from the similarity measurements an optimum
time offset.

9. Apparatus according to claim 8, in which the similarity of the
audio signals is measured in a multiplicity of frequency bands, and
the multiplicity of similarity measurements is processed to provide
a preferred time offset.

10. Apparatus according to claim 9, in which the power of the
components in the various frequency bands is used to weight the
similarity measurements.

11. Apparatus according to claim 9 or 10, in which the similarity
measurements are weighted according to frequency band.

12. Apparatus according to claim 8, in which the signals are
divided into blocks, and including means for determining the
frequency spectrum of each block, means for determining the power
variation with time for each of a plurality of frequency bands,
means for correlating the power variation of the two signals for
each frequency band, means for determining the peak of each
correlation function, and means for determining the peak value of
the peaks thus obtained to provide a desired offset.

13. Apparatus for aligning two audio signals in time, substantially
as herein described with reference to the drawings.

14. A method of editing audio signals, including aligning the audio
signals by a method in accordance with any of claims 1 to 7.

15. Audio signal editing apparatus, including apparatus for aligning
two audio signals in accordance with any of claims 8 to 13.